Using Bayesian Statistics to Predict Toxic Response Based on Gene Expression Values
Abstract
In the emerging field of toxicogenomics, microarray technology is used to measure gene expression levels in cells exposed to various toxic and non-toxic agents. A single array yields thousands of expression levels, and as the number of arrays grows, statistical methods are needed to process the information. This master's thesis documents the use of Bayesian statistics to build two versions of a Bayesian classifier, both trained to distinguish between treatment with toxic and non-toxic agents on the basis of Affymetrix array data. The classifiers are optimised with respect to the number of expression levels used, and their performance is measured with various types of cross-validation. The first classifier implemented is the Naive Bayesian Classifier, sometimes unjustly called "idiot's Bayes". It is shown to work surprisingly well on the noisy, high-dimensional array data, despite its somewhat naive assumption that all dimensions of the data space are governed by probability distributions that are independent given the class. The second version, which I have chosen to call the "Non-Naive Bayesian Classifier", is designed to take into account the dependencies that do exist between the probability distributions of different dimensions of the data space. It does so by building a graph of the dependencies in the data based on mutual information. However, the Non-Naive Classifier proves a disappointment on the array data, increasing success rates by only a few percent. Beyond classification, the classifiers are also used to mine the toxicological data for relationships between agents and between gene expression levels. Clustering shows that some of the agents form a distinct subgroup that is hard to classify with either version of the classifier.
Swedish title: Användning av Bayesisk statistik för att förutsäga toxiskt svar baserat på genuttryck
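The two classifiers described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes real-valued (for example log-scale) expression levels and Gaussian class-conditional densities for the naive classifier, and it uses k-fold cross-validation as one of the "various types" of cross-validation mentioned. The abstract does not say how the mutual-information dependency graph is constructed, so the sketch assumes a Chow-Liu-style maximum-weight spanning tree over discretised genes, which is one standard choice; it does not show how such a graph would be turned into the Non-Naive classifier. All function names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import combinations


def fit_naive_bayes(X, y, eps=1e-6):
    """Gaussian naive Bayes (assumed densities): per-class log prior plus
    per-gene mean and variance. Genes are treated as independent given the class."""
    model = {}
    n = len(y)
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Population variance plus a small floor so constant genes do not divide by zero.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + eps
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / n), means, variances)
    return model


def predict(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_post(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_post)


def cross_val_accuracy(X, y, k=5, seed=0):
    """k-fold cross-validation accuracy of the naive classifier."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        test = set(fold)
        train = [i for i in idx if i not in test]
        model = fit_naive_bayes([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(y)


def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discretised gene profiles."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log(c * n / (pa[u] * pb[v]))
               for (u, v), c in pab.items())


def chow_liu_edges(columns):
    """Maximum-weight spanning tree over genes, edges weighted by mutual
    information (Kruskal's algorithm with a union-find)."""
    parent = list(range(len(columns)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    weighted = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(len(columns)), 2)),
        reverse=True,
    )
    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges
```

To mimic optimising the number of expression levels, one could rank genes by some score, call `cross_val_accuracy` on the top m genes for several values of m, and keep the m with the best accuracy. Setting `k` equal to the number of arrays gives leave-one-out cross-validation, in which every array is tested exactly once.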